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Abstract 

Area openings and closings are morphological filters which efficiently suppress 
impulse noise from an image, by removing small connected components of level sets. 
The problem of an objective choice of threshold for the area remains open. Here, 
a mathematical model for random images will be considered. Under this model, a 
Poisson approximation for the probability of appearance of any local pattern can be 
computed. In particular, the probability of observing a component with size larger 
than k in pure impulse noise has an explicit form. This permits the definition of a 
statistical test on the significance of connected components, thus providing an explicit 
formula for the area threshold of the denoising filter, as a function of the impulse noise 
probability parameter. Finally, using threshold decomposition, a denoising algorithm 
for grey level images is proposed. 
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1 Introduction 



The general problem of image denoising consists of deciding what is the "signal" and should 
be kept, and what is the noise, and must be removed. Many different criteria can be used 
to detect the noise-induced structures. For example, the oscillations due to an additive 
gaussian noise can be measured in terms of the wavelet coefficients. The noise may then 
be removed by a thresholding in the wavelet domain. Donoho and Johnstone |H| gave an 
explicit way to choose the threshold as a function of the variance of the noise. Their claim is 
that "denoising, with high probability, rejects pure noise completely". The underlying idea 
is that in pure noise, all the structures that actually belong to the image could not appear; 
or else, the structures coming from the image itself can be defined as those "objects" 
which would have a very small probability of appearing in a pure noise. This idea was 
implemented in (4] and [E] for the detection of alignments and meaningful level lines in an 
image. 

Here, we shall focus on the size of connected components of the level sets of the image. 
Removing small components is a classical and efficient way of removing impulse noise from 
an image. This method, known as "the grain filter", was first introduced in the framework 
of Mathematical Morphology (THJ by Vincent in (20] as morphological area openings and 
closings (see also [21] and |Hj). This filter is sometimes called the "extrema killer". It was 
then generalized by Masnou and Morel in [12J, and by Monasse and Guichard in [H]. In 
|15j a similar filter was used, in the framework of gradient percolation, for recovering fuzzy 
images. 

But the main question remains: how should the threshold for the area of the components 
that have to be kept, be chosen? A natural idea, imported from statistical inference, 
consists of fixing an a priori risk level e (e.g. e = 0.001), and deciding that anything that 
has probability lower than e of occurring under a pure noise hypothesis cannot come from 
the noise and hence should be kept in the image. Thus for the threshold area, one will 
choose the integer s(n,p,e), such that a connected component of size k ^ s(n,p,e) has 
a probability less than e of appearing in a pure noise image with probability parameter 
p and size n x n. Applying a grain filter with area threshold s(n,p,e) will ensure that, 
with probability larger than 1 — e, pure noise is eliminated. To implement this, one must 
be able to compute the probability of any connected component of size k appearing in a 
pure noise image. An exact computation is not feasible. However an approximation can 
be given if the image is large: our main theoretical result (Theorem 12.41) gives a Poisson 
approximation for the probability of occurrence for any image property which is local in 
the sense that its definition involves only a fixed number of connected pixels. 

Our plan is as follows. Section [2] is devoted to the probabilistic model of noise in binary 
images: all pixels are independent, black with probability p or white with probability 
1— p. The Poisson approximation result will be stated (Theorem I2.4jl and an outline of 
its proof will be given (technical details will be postponed to the Appendix). Section El is 
devoted to applications. We will first explain how Theorem 12.41 together with numerical 
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combinatorial results on square lattice animals 1 , can be used to obtain an explicit formula 
for the size threshold s(n,p,e). An example of denoising for a binary image will be given. 
Then we shall extend the method to grey level images through threshold decomposition: 
the binary image corresponding to each grey level is treated separately, then all denoised 
binary images are recombined. Some experiments and a discussion of the obtained results 
come last. 



2 Probability of a local property 

Our probabilistic model for random images is the following. Let n be a positive integer. 
Consider the pixel set S n = {1, . . . , n} 2 . A binary image of size n is a mapping from S n 
to {0, 1} (black/white). Their set is denoted by E n . It is endowed with the probability 
distribution /i n p defined by: each pixel is black with probability p or white with probability 
1 — p, and all the pixel colors are independent. A random image of size n and probability 
parameter p, denoted by I n ,p, is a random element of E n with probability distribution fi ntP . 

The pixel set S n is embedded in Z 2 and naturally endowed with a graph structure. We 
consider in this paper the case of 4-connectivity (2 horizontal and 2 vertical neighbors). For 
purely technical reasons, it will be convenient that all pixels have the same neighborhood: 
this is why we impose periodic boundary conditions, deciding that (1, j) is a neighbor of 
(n, j) and (j, 1) of (j, n). Thus the graph is a 2-dimensional torus. As usual, the graph 
distance d is defined as the minimal length of a path between two pixels. We shall denote 
by B(x, r) the ball of center x and radius r with respect to the distance d. It is defined by 

B(x,r) = {y E H n ; d(x,y) ^ r} . 

Notice that this ball B(x,r) is diamond-shaped (it is a rhombus) and that for r < n/2, it 
contains 2r 2 + 2r + 1 pixels (see Figure HJ. For the rest of this section, the radius r is a 
fixed integer, and the image size n is larger than 2r + 1. 



Figure 1: Example of an image on the ball B(0, r) with r = 3. This small image is also 
called a pattern. The number of black pixels of this pattern D is b(D) = 2. 

1 Square lattice animals or "polyominoes" are simply defined as connected clusters of squares in the 
plane (for example, the "Tetris" game uses all lattice animals of size 4). 
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The image properties we are interested in are all local, in the sense that they can be 
described inside balls of a fixed radius. All balls are translations of each other. We shall 
choose a ball of radius r, say 5(0, r), and fix a translation t x , from 5(0, r) to B(x,r) for 
all x. We call pattern, and denote by D, an image defined on 5(0, r), and determined 
by its set of black pixels, denoted by [3(D) (see Figure [T] for an example of pattern). Of 
course, 5(0, r) \ [3(D) is the set of white pixels. We shall denote by b(D) the cardinality 
of (3(D) (number of black pixels in the pattern). We shall deal with rather small levels 
of noise, seen as relatively sparse black pixels on a white background. This is of course a 
mere convention: swapping black and white, together with p and 1— p does not change the 
model. Thus, in what follows, we will always assume that p ^ |. 

If D is a pattern on 5(0, r) and r is a translation of pixels, we shall denote by r(D) 
the pattern on 5(r(0),r), whose set of black pixels is t((3(D)). If r(0) = x, we denote by 
D(x) the property: "the restriction of the image to B(x,r) is r(D) n . The property we are 
actually interested in is 

D = (3xe~ n , D(x)) . 
In other words D means: "a copy of pattern D can be found somewhere in the image". 

The patterns D are the building blocks of all local properties. Indeed, there exists only 
a finite number of such patterns (precisely 2 2r + 2r+1 ): let us denote their set by V. Any 
assertion relative to the pixels in 5(0, r) will be called "local": it can be expressed in a 
unique way as a disjunction (logical "or", denoted by V) of distinct patterns. 

The following definitions will be used in the counting of occurrences of a local property 
in an image. 

Definition 2.1 Let ip be a local assertion, relative to the pixels in 5(0, r). 

1. The definition set of if), denoted by T>(ip), is the subset ofV such that 

if)= \/ D. 
DeT>(if>) 

2. The black index b(if)) of if) is the integer b(if)) defined by 

b(ib) = min {b(D)\ . 

3. A meaningful definition set of if), denoted by U (ip), is a subset ofV(if>) such that 

(a) V5 G V (if)) , b(D) = b(if)) , 

(b) Ifr is a translation, then D, D' G V Q (ip) and r((3(D)) = (3(D') imply D = D' , 

(c) D G V(if)) and b(D) = b(if)) imply 3r , 3D' G V (if)) , s. t. r((3(D)) = f3(D') . 

All meaningful definition sets have the same cardinality, which will be called the 
meaningful index of if), and denoted by e(if>). 
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The black index b(ip) is the minimal number of black pixels, in a pattern that satisfies if). 
One can see the meaningful index e(if)) as the maximal number of patterns with exactly 
b{ip) black pixels that satisfy if), up to possible translations. Both will be used to count 
occurrences of the local property based on if). 

Example. Let us illustrate all these definitions by considering a simple example: the 
property "there exist two connected black pixels". On the ball of radius r = 1, the definition 
set of this local assertion if) is composed of all those patterns on B(0, 1) whose center is 
black, and at least one of the 4 neighbors is also black (15 patterns). The black index b(if>) 
of if) equals 2, and its meaningful index is e{ip) = 2 (a possible meaningful definition set is 
made of the two patterns on B(0, 1) such that the center and its right horizontal neighbor, 
resp. its top vertical neighbor, are the only black pixels in the ball B(0, 1)). 

Definition 2.2 Let if) be a local assertion, and if)(x) its localization on the ball centered at 
x : 



Our basic example of a local property if) is: "there exists a connected component of k black 
pixels". A connected component of size k is always included in a ball of radius r ^ k/2. The 
local assertion if) is "there exists a connected component of size k in -6(0, r)". The definition 
set is the set of all patterns on B(0, r) having at least k connected black pixels. The black 
index is the minimal number of black pixels necessary for if) to be satisfied (obviously k in 
our example). The meaningful index is the number of connected components of size k, up 
to translations (see Sectional). 

For a fixed level p with < p < 1, if we let n tend to infinity, by the independence of 
pixels, it is easy to see that asymptotically any pattern will be present in a random image 
with a probability tending to 1 (see [2] for more precise results). Therefore the asymptotic 
probability for the random image X np to satisfy if) is 1, whatever if). That asymptotic 
probability can be different from 1 only if p = p(n) tends to as n tends to infinity. Thus 
our images will have a relatively small proportion of black pixels. 

A classical object of the theory of random graphs (see [TJ [TH] as general references), 
is the notion of threshold function. It describes the appearance of a given subgraph in a 
random graph. The notion of threshold function easily adapts to random images. Let A be 
an image property. The function 9(n) is called a threshold function of A if for p(n) ^1/2 
then 



if)(x) = \J D(x) . 



Deu(v>) 



We call local property based on if), and denote by if) the property 



if) = (Bx , if>(x)) . 



n- 
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and 

T)[ 71 ) 

lim — - = 00=^- lim n n , P (n)(A) = 1 . 

Notice that a threshold function is not unique. For instance if 6{n) is a threshold function 
for A, then so is c6{n) for any positive constant c. It is customary to ignore this and talk 
about "the" threshold function of A. We then have the following lemma. 

Lemma 2.3 The threshold function of the local property ip is n ^ . 

Proof. We shall just give here the main steps, since the detailed proof will appear in P|. 
Let D be a pattern and let X n denote the number of copies of D in the image. Then 

^n, P {n){D) = /i n , p( „)(X„ >0)< E(X n ) = n 2 p(nf D \l-p(n)) 2r2+2r+1 - b ^ ^ n 2 p{n) b ^ . 

On the other hand, let Y n denote the number of copies of D occurring in balls B(x,r) 
where both coordinates of x are multiples of 2r + 1 (which implies that two such balls 
cannot meet). The number of such balls is n 2 where n r = L^rpiJ • Then 

^n,p(n)(X n > 0) ^ Hn,p(n)(Yn > 0) = 1 — ^ n ,p(n)(Y n = 0) 

= 1 - (l -p(nf D \l-p(n)) 2r2+2r+1 - b ^ r 
^ 1 _ exp(-n r 2 p(n) b ^ (1 - p(n)) 2r2+2r+1 - b ^) 
Using these inequalities and the definition of a threshold function, we conclude that 6{n) = 

n -2/b(D) ig 

the threshold function of the property D . To conclude, one has to check that 
the threshold function of a disjunction of patterns is the smallest threshold function of 
these patterns. □ 



Lemma f2. 31 means that the appearance of a local property mainly depends on its black 
index: if p(n) is small compared to n~&, then the probability of any local property that 
needs b black pixels to be satisfied is small. If p(n) is large compared to n~&, then the 
probability is large. The particular case b(if)) = corresponds to the appearance of a white 
ball. If there exists a, with ^ a < 1, such that for all n, we have p(n) ^ a, then the 
probability for a white ball of being present in the random image always tends to 1 as n 
tends to infinity: there is no threshold function. From now on, we will always assume that 
the black index of ip is positive. 

Lemma E31 suggests that the correct scaling for p(n) when one studies a local property 

~ 2 ~ 

ip is p(n) = cn ^ . Our main result shows that with this scaling, the probability of ip in 
a random image converges to a non trivial limit. 

Theorem 2.4 Let ip be an assertion on B(0,r), with black index b(ip) and meaningful 

2 

index e(ip). Let p{n) = cn '(+) , where c is a positive constant. Then 

lim /v P (n)(V0 = 1 - exp(-e(V)c fe W) . (1) 



6 



The reason why such a result is called a Poisson approximation becomes clear if one 
considers the property "there exists a black pixel". Let X n be the total number of black 
pixels. Since all pixels are independent, the random variable X n follows the binomial 
distribution with parameters n 2 and p(n). In particular the probability that there exists a 
black pixel is 

F[X n >0] = l-(l-p(n)r 2 . 

Here the black index is 1 and the threshold function is n~ 2 . Take p(n) = cn~ 2 . Then the 
binomial distribution of X n converges to the Poisson distribution with parameter c, and 
the probability that there exists a black pixel (X n > 0) tends to 1 — exp(— c). 

The situation is not so simple as soon as the black index is larger than 1. Consider for 
instance again the local property ip: "there exist two connected black pixels". We already 
saw that on the ball of radius r = 1, the definition set is composed of all those patterns on 
B(0, 1) whose center is black, and at least one of the 4 neighbors is also black (15 patterns). 
Consider the number of occurrences of any of those patterns, somewhere in the random 
image. It is a sum of Bernoulli random variables. However they are not independent: 
patterns on balls centered at two adjacent pixels have one pixel in common. The same can 
be said of any local property ip: the number of occurrences of ip(x) can be viewed as a sum 
of (dependent) Bernoulli random variables. The sum of a large number of Bernoulli r.v.'s 
converges in distribution to a Poisson distribution, provided the dependencies between the 
variables are not too large. In the theory of random graphs, similar results are frequent 
(see e.g. [19] Lecture 1 p. 296, Lecture 2 p. 303 or Lecture 5 p. 314). 



Proof of Theorem 2.4 There are several ways to prove a Poisson approximation result. We 



chose the famous "moment method" based on the following result ([T], Chapter 1 p. 25). 

Lemma 2.5 Let (X„) n€ N* be a sequence of integer valued, nonnegative random variables 
and X be a strictly positive real. For all n,l 6 N* define the quantity 

E l {X n ) = Y,nX n = k) " 



k>l 



(k-iy. 



If for all I E N* , lim„_ i . 00 Ei(X n ) = \ l then (X n ) converges in distribution to the Poisson 
distribution with parameter X. 

In our case, X n counts the number of occurrences in the random image of some patterns, 
to be precised later. The "moment" E[(X n ) is the expected number of ordered /-tuples of 
occurrences of those patterns. 

Firstly, one should observe that patterns in the definition set of ip cannot be all treated 

2 

equally: since p(n) = an & w , by Lemma [221 any pattern with more than b(ip) black pixels 
has a vanishing probability of being observed. Hence we can reduce the set of patterns to 
those having exactly b(ip) black pixels. In the example of two connected pixels with r — 1, 
V(if)) has 15 different patterns, but only 4 of them have exactly 2 black pixels. 

Now, one has to take care of multiple counts. Among the 4 patterns on 5(0, 1) that have 
2 black pixels, 2 patterns have two horizontal black neighbors, and the 2 other patterns have 
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two vertical black neighbors. Assume the image has only one occurrence of two horizontal 
black neighbors. If we examine all possible pixels x, we will find two adjacent centers for 
which ip(x) is satisfied. In order to obviate this problem, we need to count patterns up 
to possible translations. We say that two patterns with black index b{if>) are equivalent if 
their sets of black pixels are translations of each other. The number of equivalence classes 
is the meaningful index e(ip) of Definition 12.11 (In the example of two connected black 
pixels, there are two equivalence classes: horizontal or vertical neighbors). 
We choose a meaningful set, i.e. we fix a pattern for each equivalence class: 

Do WO = {A, • • .,%} ■ 

The counting variable X n to which Lemma 12.51 will be applied is the total number of 
occurrences of one of the patterns Di, . . . , Defy), m the random image I n ,p(n)'- 

e(V) 
i6S„ i=l 

where I denotes the indicator function of an event. The expectation of X n is 

E(X n ) = n 2 e(i/j) (p(n)) bW (l - p(n)) 2r2+2r+1 - b ^ . 

As n tends to infinity, it tends to e(ip) c b<y ^\ which is the parameter of the Poisson approx- 
imation in formula (QJ). In order to apply Lemma 12.51 to X n , one has to check that the 
hypothesis holds. 

Lemma 2.6 

VIEW, lim Ei{X n ) = (e{i))c m ) 1 . 

n^oo 

The proof of Lemma f2.6l is rather technical and will be given in the Appendix. 

Now Lemma EH implies that X n converges in distribution to the Poisson distribution 
with parameter e(if>)c b( ^>. Therefore fJ»n,p(n)(Xn > 0) tends to 1 — exp(— e(ip)c b ^). It is 
clear that X n > implies that 1 n> %>(n) satisfies ip. Hence p n , P (n)(X n > 0) ^ /i niP ( n )( , 0). 
Moreover, the event (ip \ (X n > 0)) = (tp H (X n = 0)) implies the appearance of a pattern 
with at least b{ip) + 1 black pixels in a ball of radius r, and by Lemma l2~3| its probability 
tends to as n tends to infinity. Therefore, 

lim n n - p ( n )(X n > 0) = lim JW„)$) = 1 - exp(-e(^)c 6(v,) ) . 

n— >oo n— >oo 

It should be noticed that the asymptotics of X n does not depend on the choice of the mean- 
ingful definition set {Di, . . . , D e (»}- It does not depend either on the radius r of the ball. 
Consider for instance the property if) "the image contains two horizontally connected black 
pixels". Its definition set for the ball 5(0, r) has r 2 2 2r +2r elements. Among these, only 
2r 2 have exactly 2 black pixels, and there is only one equivalence class up to translations, 
whatever r. Therefore r is a phantom parameter, as should be expected. It serves only to 
ensure that properties remain local. □ 



8 



3 Application to image denoising 



In the previous section, we computed the asymptotic probability of appearance of any local 
property in a random binary image. This provides the basis of a statistical test to decide 
whether an observed pattern in an image may be due to noise or not, and this test can be 
applied for image denoising. In this section, all the considered images will be corrupted 
by the same kind of noise, namely impulse noise. This type of noise models for example 
the fact that some (unknown) part of the data is lost. We will assume that the probability 
parameter of the noise is known. We will first start with the denoising of binary images, 
and then extend it to grey level images using their threshold decomposition. 



3.1 Binary images 

Let Jo be the original (non degraded) binary image of size n x n. This original image 
Jo is then corrupted by impulse noise, which has a probability parameter p in the white 
components and a probability parameter q in the black ones (see Figure [SI for an example). 
We shall see in the next section why it is important to allow black and white pixels to be 
destroyed with a different probability. Thus the noisy image J is given by 

Wx, I(x) = J (x) ■ (1 - C P (x)) + (1 - /„(*)) ■ ( g (x), (2) 

where the ( p (xys (resp. C 9 (x)'s) are independent Bernoulli random variables with param- 
eter p (resp. q). In other words, we have the following conditional probabilities 

F(I(x) = | I (x) = 1) = p and P(J(x) = 1 | I (x) = 0) = q. 

As can be seen in Figure El the impulse noise creates small black and white connected 
components. These small components will be removed using a statistical decision based on 
their size ("size", in this paper, always means "area"). We are first interested in the black 
connected components (with respect to 4-connectivity). The results of the previous section 
give us the threshold function and also the probability of appearance of such components. 
More precisely, the threshold function for a given (fixed) black component of size k is 
6{n) = n~ 2 l k and its asymptotic appearance probability in a n x n image of noise with 
probability parameter p{n) = c9(n), as n goes to infinity, is equal to 



Now, if we are interested in the appearance of a component of size k (i.e. any of them, 
not only a given one) , Theorem 12.41 claims that the asymptotic (for large n) probability of 
appearance is 



1 - e~ akck 



where is the number of 4-connected components one can make with exactly k pixels (up 
to translations). Writing c k = n 2 p k , we thus have an approximation for the probability 
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of appearance of a component of size k in the n x n image, with a proportion p of black 
pixels. We denote by PA(n, k,p) this approximation : 

PA(n,Jfe,p) = l- e - n2akpk . 

The 4-connected components are known in the combinatorics literature as "square lattice 
animals" or "polyominoes". Counting these objects is a difficult combinatorial problem 
and there is no general expression for a&. However, some asymptotic results are known: a 
concatenation argument (TO) shows that there exists a constant a, called growth constant, 
such that: 

lim (a/.)* = sup(afc)* = a . 

The exact value of a is unknown. Numerical estimates give a ~ 4.06 and the best published 
rigorous bounds for it are 3.9 < a < 4.65 (see PHEIIII])- But thanks to some numerical 
studies 2 , the values of the sequence (ojfc)fc^i are known up to k = 47, which will be enough 
in practice for denoising applications. The first terms are: d\ = 1, a 2 = 2, = 6, 
Gt4 = 19, etc. Furthermore, numerical computations show that for p ^ p ma x — 0.2, one has 
a k+iV ^ a k for k e [1,47], which ensures that PA(n, k + 1,1?) ^ PA(n, k,p). This means 
that the probability of appearance of an animal is a decreasing function of its size. This 
is rather reasonable: for fixed values of p and n, it would not make much sense to keep a 
connected component of size k and to remove one of size k' > k. 

Let us fix a (small) positive real e which will be our risk probability, in the sense of 
statistical testing. If the size k of a connected component observed in a noisy image I is 
such that PA(n, k,p) ^ e, then we will consider that it comes from the original image Jo, 
and keep it. If PA(n, k,p) > e, it will be regarded as noise and removed. Thus the size 
threshold for the components we keep is defined by: 

s(n, p, e) = mf{k ; PA(n, k,p) = 1 — e~ n1 ^ ^ e}. (3) 

A component with size k ^ s(n,p,e) has a very low probability (less than e) of appearing 
in a pure noise image. In Figure [2jb, we plot the size threshold s(n,p,e) as a function of 
the noise probability parameter p, for a fixed value of n = 256 and three different values 
of e: 10-\ 10" 2 and 10" 3 . 

The algorithm for the binary image denoising can be decomposed in four steps: 

1. Extract all the 4-connected black components of the noisy image /. 

2. Remove the ones which have a size less than s(n,p,e) (i.e. change their pixels into 
white). Obtain a new binary image. 

3. Extract all the white 4-connected components of this new image. 

2 for up-to-date information on the topic, see the web-site of the "On-line Encyclopedia of Integer 
Sequences", http://www.research.att.com/~njas/sequences/ and references therein. 
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e=10 



e=10 



s=10 



noise intensity p 



Figure 2: The size threshold s(n,p, e) as a function of the noise probability parameter p G [0,0.18], 
for n = 256 and e = W 1 , 10~ 2 and 10~ 3 . 



4. Remove the ones which have a size less than s(n, q, e) (i.e. change their pixels into 
black), to obtain the final denoised image denoted by / = TI. 

To summarize, this denoising filter T can be written as: 

T = Tt , o T7 . , 

s(n,q,e) s(n,p,e) ' 

where T+ (resp. T~) is the morphological area opening (resp. closing) of size s defined by 
L. Vincent in [20j. See Figure El for an example of the obtained result and for a comparison 
with the results obtained with a more standard binary filter (namely the median filter). 

Before explaining how this method will be extended to grey level images, let us make 
a few general comments. 



The method is valid when p is not too large, since we need akp k to be small. In 
practice, we are limited to p ^ p ma x — 0.2. 

The dependence on e is low since it is in fact a log(e)-dependence. Indeed, l — e~ n2akpk 
is approximately equal to n 2 akP k when the value of this expression is small. If we 
replace by a k , the threshold for the minimal size of the components we keep is 
approximately given by 

log 5 — 2 log n 



s(n,p,e) 



log a + log p 



The boundaries of the remaining components are not smoothed. This comes from the 
fact that when some noise is at the boundary of a component, it becomes part of it. 
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In order to remove it, one would need other a priori knowledge of the original image 
(such as smooth or straight boundaries as in the case of Figure 02). It is actually a 
general problem of image denoising: one has to define some a priori model for the 
image. Here the underlying model is that the original binary image is made of "large" 
(as compared to the noise) black and white connected components. 

• The two filters T~ and T s + do not commute (see Figure EJ). This was already noted 
by Vincent in [20] • One solution he proposed is to use them in alternating sequential 
filters [13 CHI with increasing sizes of area. This may not be a real issue, since he 
also noticed that T~ o T+ and T+ o T~ are visually extremely close (this will be even 
more true for grey level images). Another solution, proposed by Masnou and Morel 
in [12], and then formalized by Monasse [13], is to process simultaneously upper and 
lower level sets. This grain filter denoted G t (where t is the area threshold) is done 
by a pruning of the tree of all level sets, built thanks to the inclusion principle (this 
algorithm which is very fast is called the Fast Level Set Transform p„ 1 ) . 

The fact that the foreground and the background of a binary image are treated 
in a complementary way is a general problem in Mathematical Morphology. Many 
operators are not self-dual, and they often occur pairwise: like dilation/erosion and 
opening/closing for example. It is worth mentioning that in [2], H. Heijmans describes 
a general method to construct morphological operators which are self-dual. 

• It is generally considered that, for consistency reasons, using the 4-connectivity on 
the black (or white) pixels should be followed by using the 8-connectivity for the 
complementary set. From a theoretical point of view, the method we proposed can 
be extended to 8-connectivity in a straightforward way. In order to apply the method, 
one would have to count the number of 8-connected components of size k, which are 
not available in the literature, whereas the (a&)'s are known up to k = 47. Conse- 
quently, for our application to image denoising, we decided to treat the foreground 
and the background in the same way, with 4-connectivity. 

• If the original image J is all white (I = 0) , and if it is corrupted by some noise with 
probability parameter p as described by equation (J2j, we obtain an image I which is 
"pure noise". The probability that it contains a connected component with size larger 
than s(n,p,e) is (by definition of s(n,p,e) and thanks to theorem 12. 4j) less than e. 
Thus, 

nT; [n ^ £) I = I )>l-e, 

which means that, with probability larger than 1 — e, pure noise is completely re- 
moved. 

Thus e represents the "significance level" of our statistical method for denoising: the 
probability of not removing a component coming from the noise is less than e. In 
practice, we generally take e = 10~ 3 or 10~ 2 (the results are visually the same). This 
parameter e is completely independent of the image (which is not the case of the size 
n of the image, or the probability parameter p of the impulse noise): e has to be 
fixed by the user in the same way as the risk level in statistical hypothesis testing. 
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• For reasons of simplicity, the denoising filter T was described in the framework of 
an image of size n x n. The method extends straightforwardly to an image of size 
mxn, where m ^ n, by simply changing the size threshold s(n,p, e) into s(m, n,p, e) 
defined by 

s(m,n,p,e) = inf{fc; PA(^/nm, k,p) = 1 - e~ mnakpk < e}. 



3.2 Grey level images 

Let u be a grey level image, of size nxn and grey level values in the range [0, 255]. Assume 
that this image is corrupted by impulse noise with probability parameter p. This means 
that the observed noisy image v may be written in the form: 

Vx, v(x) = (l-(p(x))-u(x) + ( p (x)-v(x), (4) 

where the C P (:r)'s are independent Bernoulli random variables with parameter p and the 
u(x) 7 s are i.i.d.r.v.'s, uniformly distributed on [0,255]. 

For each level A G [0,255], we can consider the thresholded images u x = I u ^ x and 
v x = I v ^\. The grey level images may then simply be recovered by u = Ylx u >^ an< ^ 
v = v\. The binary noisy image v x is a corrupted version of the binary image u x ; they 
are related by 

F(v x (x) = | u x (x) = l)=px J- and ¥(v x (x) = 1 1 u x (x) =0)=px(l - — ). 

25o 25o 

We are thus back in the framework described for binary images with parameters p x = 
pA/256 and q x — p(l — A/256). The image v x can be denoised following the method 
described in the previous subsection. Finally, we reconstruct a grey level image by simply 
adding the binary ones: v = ^2 x v x . This can be summarized by the formula 

255 A / A \ 
v = Tv = > T~t , o T~, \{v x ), where p x = p and q x — p ( 1 I . (5) 

L^t s(n,q x ,e) s(n,p A ,e)V ^ ^ A 256 \ 256 / 

A=0 V 7 

Figures |U and El give two examples of results obtained by this filtering. 

One natural question that can be asked is whether the filter T defined by formula 
(JSJ) is a morphological filter (see [IE] and [IB] and references therein for the definition 
and properties of morphological filters). Unfortunately, the answer is negative. For two 
grey levels A ^ A', one has v x ^ v X r, and for a fixed area threshold t one would have 
Tf(v x ) ^ Tf(v X t) (because area openings and closings are morphological operators). Now, 
the two thresholds s(n,p x ,e) and s(n,p X f,e) can be different, i.e. s(n,p x ,e) > s(n,p x >,e) 
and thus it is not necessarily true that T~^ npx£ ^(v x ) ^ I^ np)£ j(^). This happens when 
v x and v X i both contain the same small black connected component of size k such that 
s(n,p x ,e) > k > s(n,py,e). However, in the experimental results, we noticed that this 
rarely happens: for most values of A, one has v x ^ £>a-i- 
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In order to illustrate the interest of adapting the area threshold to each grey level, we 
treated the same image using our method, then using a fixed area threshold (for this we 
used the algorithm developed by Monasse in [EH)- The results are those of Figures [HI and 
H3 Figure M shows the result of the usual grain filter, denoted by G t , for two different 
values of the area threshold: t — 10 and t = 20. One can notice that the parameter value 
t = 10 seems too low since there is still some remaining noise (for example on the coat 
of the cameraman). On the other hand the value t = 20 seems too large, since some of 
the original structures have disappeared (for instance the white parabola at the top of the 
building) and still too low (there is some remaining noise on the coat). These results have to 
be compared with the one of Figure El-c. This last figure shows that thanks to the adapted 
area threshold s(n,p\,e) a small white component can be kept and at the same time, a 
larger grey component removed. These results also illustrate what we have proposed in 
this paper, namely an adapted and automatic way to choose the right parameter for the 
area openings and closings. 

3.3 Extension to other noise models 

In the previous subsection, we have explained how the theoretical results of Section [21 
can be used to denoise an image degraded by impulse noise. Now, even if the proposed 
denoising procedure corresponds to an impulse noise model, it is interesting to see how 
it works in the presence of white noise. An example of the obtained results is shown on 
Figure [3 we used again the cameraman image, which is here degraded by white noise with 
standard deviation a = 15. It is then denoised using the filter defined by Equation (JSJ) with 
parameter value p = 0.2 and e = 10~ 2 (on the middle image) and p — 0.1 and e = 10 -2 
(on the right image). The main question is here: how to choose the value of the parameter 
p used in the filter, in relation to the standard deviation a of the white noise ? 

In the case of impulse noise, we were able to relate the size threshold s of the area 
openings and closings to the impulse noise probability parameter p and to the grey level 
A. The main result was then: if we take u = in Equation (JH), then the degraded image 
v is pure impulse noise, and after filtering (Equation ©), we have, by definition of the 
threshold s(n,p, e), that Tv = u with probability larger than 1 — s. 

Now, if we want to obtain in the same way a denoising filter for white noise, we have 
first to be able to relate the size threshold s used for the area openings and closings to the 
standard deviation a of the white noise and to the grey level A of the thresholded image. In 
order to do this, let us consider a pure white noise image w: all the w(x) J s are independent 
identically distributed random variables with distribution J\f(0, a 2 ) (gaussian with mean 
and variance a 2 ). For A G R, let us consider the thresholded image w\ = I w ^\. We then 
have 



This last term, denoted by p\ i<7 should be the analogue of the probability parameter p\ 
defined in the case of impulse noise (Equation JSJ in the previous subsection). Now the 
main difference here is that p\ <(T is not necessarily small (it goes to 1 as A goes to — oo), 
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and the thresholded image w\ may contain arbitrarily large connected components. Thus 
a filter like the area opening or closing will never be able to remove all the noise. The 
problem here is that the type of filter we have considered is not adapted to white noise. 

Generally, when using a probabilistic approach for filtering, one needs a model for the 
image and one for the noise. Here, we do not need a model for the image, since we only 
use an "a contrario" hypothesis. It means that we only need to know that "the image is not 
noise" in the sense that large connected components, which have a very small probability of 
appearing in impulse noise, necessarily belong to the image. This approach does not work 
in the case of white noise since the size of the connected components of level sets is not 
a good way to discriminate white noise from the image (both contain large components). 
Nevertheless, if we are able to find some characteristic geometric features (as the size 
of connected components in the case of impulse noise), the proposed approach could be 
extended to white noise or to other models of noise. 

4 Conclusion 

We have introduced a mathematical model for random images, in which we were able to 
compute the probability of appearance of any "local pattern" (Theorem l2.4jl . This was then 
used to give an explicit formula for the size threshold s(n,p,e), such that the probability 
of appearance of a component of size k s(n,p,e) in a n x n image of pure noise with 
probability parameter p is less than e. Using this value of s(n,p,e) for the area openings 
and closings defined by Vincent will ensure that, with probability larger than 1 — £, pure 
noise is completely removed. This denoising process was then extended to grey level images 
using their threshold decomposition. There, the proposed area threshold depends on both 
the probability parameter p of the impulse noise and the grey level A of the level set. 
Now, some questions remain, that have not been addressed in this paper: if the probability 
parameter p of the impulse noise is unknown, what is the best way to estimate it ? For a 
binary pure noise image, the best estimate of p is simply the ratio of the number of black 
pixels to the area of the image. Then, by analogy, a first answer for binary images (like 
the chessboard for example) is to compute the relative number of black pixels outside a 
dilation of the "large" black components. Thus, for a grey level image, it is possible to 
use the threshold decomposition to obtain initial estimates of p\ = p x A/256 and then to 
estimate p using, for example, a linear regression. Now, it is not clear that this estimate 
will be a good one since natural grey level images often contain textures creating small 
components which over-estimate p. In order to obtain a reliable estimate of p, it would 
be necessary to use also some information extracted from the statistical moments (like the 
covariance, three-point probability, etc. . . ) measured on the image. 
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A Appendix 



Proof of Lemma \2.(\ 

Fix I G N*. Recall that X n counts the number of occurrences of the meaningful patterns 

_ _ 2 

D\, . . . , in the random image 1 n ,p{n) where p(n) = cn b( » . We are interested in: 



E l (X n ) = J2nXn = k) — 



(k-iy. 



We need to prove that Ei(X n ) tends to (e(ip)c b ^) 1 as n tends to infinity. 

One can see Ei(X n ) as the average number of ordered Z-tuples of copies of the patterns 

Di, . . . , -D e (» in T ni p{ n ). Thus, we can write: 



Ei{X n ) 



E 



^ ^ l D n {x 1 )A...AD Jl {xi){^n,p{n) 

Xl x t l<y lv ..,j^ e (</,) 



J 



= Yl Yl Yl f"n,p(n)(Dj 1 (x 1 ) A . . . A Dj t (xi)) , 

8=1 (xi,...,Xt) 

eC(s) <e(V0 

J, C(s) represents the set of I— tuples (xi,...,xi) of pixels in 



where, for s — 1, 

such that the set {B(xi, r), . . . , B(xi, r)} is composed of s equivalence classes for the 
4— connectivity relation. 

The term corresponding to s = I in the last sum will be denoted by E l (X n ) and the rest 
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by E'i (X n ). The quantity E[(X n ) can be seen as the average number of ordered /-tuples 
of copies of Di, . . . , D e ^, on non-overlapping balls. We will first show that: 



hm E[{X n ) = (eWc^) 1 . (6) 

n— >oo 

Then we will prove that E l (X n ) tends to as n tends to infinity. 

We want to choose I pixels x\, . . . , xi such that the balls of radius r centered on those 
pixels are two by two disjoint. For the first pixel x±, there are n 2 possibilities. Let 2 $C j ^ I 
and suppose pixels xi, . . . , Xj-\ have been chosen. For the j-th choice, the set of all pixels 
x such that d(x,Xk) ^ 2r for some 1 ^ k ^ j — 1, must be avoided. The cardinality of 
this set is bounded by (j — 1) x (8r 2 +4r + 1) whatever X\, . . . , Xj_i. This bound does not 
depend on n. So, asymptotically the number of choices for the j— th element is n 2 , and 
consequently the cardinality of C(l) is equivalent to n 21 . On the other hand, if two balls 
B(x,r) and B(x',r) are disjoint, then for all 1 ^ j,j' ^ e(ip), the random variables Id.( x ) 
and Ijj ,(3.1) are independent. Therefore, we obtain the first limit (relation (jHJ)): 

E[{X n ) ~ n 2 \e^)p(n) b ^\l - p(n)) 2r2+2r+1 ~ b ^Y ~ (e(ip)^) 1 . 

The factor e(^) z comes from the choice of the e( , 0) patterns Z?i, . . . , D e ^ for the Z chosen 
balls. 

There remains to prove that Ei(X n ) tends to as n tends to infinity. The intuition 
is that if two patterns occur in overlapping balls, then locally more than b{ijj) black pixels 
are present in a ball of radius 2r. This has vanishing probability, by Lemma ESI 
Let 1 ^ s ^ I — 1 and (xi, . . . , x{) be an element of C(s). Let Ci, . . . , C s represent the 
connected components of the set \J l k=1 B(xk, r). Then by independence between them (they 
concern disjoint pixel sets): 

s 

^ n ,p{n){.DjAxi) A ... A DjXxi)) = Yl Vn, P {n)( f\ D jk (x k )) . 

m=l k;B(x k ,r)eC m 

As a consequence of s ^ I — 1, there exists at least one connected component, say Cx, 
having at least two elements. Since the black pixel sets of two different patterns of T> (tp) 
cannot be translated of each other, there must be at least b(if)) + 1 black pixels in C\. Thus 
we have 

/V P (n)( A D jk (x k ))^p(n) b ^ +1 . 

k;B(x k ,r)eCi 

For the other connected components, we simply bound 

Vn,p(n)( f\ D jk (x k )) ^ Vn,p(n)(D jkrn (x km )) ^ p{n) b W , 
k;B(x k ,r)eC m 

for any index k m such that B(x km ,r) G C m . Therefore, we obtain the following result: 

/Jn, P (n)(A,(xi) A ... A < p(n) sbW+1 . 
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Finally, the set C(s) only has 0{n 2s ) elements and the number of ways to choose / elements 
among D ± , . . . , -D e (v>) does not depend on n. Consequently, the desired result follows: 

1-1 1-1 
E l (X n ) ^ 2^0(n 2s x n = 2^0(n~^)) = o(l) . 

s=l s=l 

□ 
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Figure 3: First row: on the left, the original binary image Iq of size 256 x 256; on the right, the 
corrupted image I. White pixels have been changed with probability p = 0.1 and black ones with 
probability q = 0.2. Second row: on the left, the result of the denoising algorithm with e = 10~ 2 
when removing first black components and then white ones (i. e. applying T^t qe \ T~/ n p £ J; 
on the right, denoising by first removing white components and then black ones (i.e. applying 
Tl s o T + , J. The two imaqes are not the same, illustratinq the fact that the two operators 

s(n,p,e) s(n,q,e) / 3 ; a j c 

T~ and T A + do not commute. However in both cases, small black and white components due to 
noise have been removed. Only the boundaries of the remaining ones are different. Third row: 
results obtained when applying median filtering with a disk of radius r = 2 (on the left) and a 
disk of radius r = 5 (on the right). The value of the parameter r = 2 seems too small since some 
noise is still present in the black components. Now, for the value r = 5, one can notice that the 
black corners have been eroded. The reason for this is that, in the noisy image, the probability 
parameters p and q (used to corrupt respectively the white and black pixels) were such that q > p. 
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Figure 4: Left: image v obtained with impulse noise with probability parameter p = 0.15 on the 
Lena image. Middle: thresholded image v\ for the grey level A = 150. Right: denoised image v 
obtained by the noise-adapted grain filter T with e = 10~ 3 . 
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Figure 5: From left to right, top to bottom: (a) the original cameraman image u (size 256 x 256); 
(b) degraded image v, with impulse noise probability parameter p = 0.2; (c) filtered image v, 
obtained with e = 10 -3 ; (d) image of the difference u — v. It shows that most of the noise has been 
removed, except at the boundaries of the objects and also in the grass texture. 
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Figure 6: Result of the filtering of the noisy image v with the usual grain filter Gt with area 
threshold t = 10 on the left and t = 20 on the right. 




Figure 7: From left to right: (a) the cameraman image degraded by white noise with standard 
deviation a = 15; (b) denoising of the previous image by the filter defined by Equation with 
parameter values p = 0.2 and e = 10~ 2 ; (c) denoising by the same filter with parameter values 
p = 0.1 and e = 10~ 2 . 
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